Cross-Entropy Clustering
We construct a cross-entropy clustering (CEC) theory which finds the optimal
number of clusters by automatically removing groups which carry no information.
Moreover, our theory gives a simple and efficient criterion to verify cluster
validity.
Although CEC can be built on an arbitrary family of densities, in the most
important case of Gaussian CEC:
-- the division into clusters is affine invariant;
-- the clustering has a tendency to divide the data into
ellipsoid-type shapes;
-- the approach is computationally efficient, as we can apply the Hartigan
approach.
We also study, with particular attention, clustering based on spherical
Gaussian densities and on Gaussian densities with covariance $s \cdot \mathrm{I}$. In
the latter case we show that, with $s$ converging to zero, we obtain the
classical k-means clustering.
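To make the Gaussian case concrete, the following is a minimal sketch of the CEC energy as we read it: each cluster pays $-\ln p_j$ for its probability mass plus the cross-entropy of its points with the best-fitting Gaussian. The function name, the small ridge term, and the degeneracy guard are our assumptions, not part of the paper.

```python
import numpy as np

def gaussian_cec_energy(X, labels, k):
    """Sketch of the Gaussian CEC energy:
    E = sum_j p_j * ( -ln p_j + d/2 * ln(2*pi*e) + 1/2 * ln det(Sigma_j) )."""
    n, d = X.shape
    energy = 0.0
    for j in range(k):
        Xj = X[labels == j]
        if len(Xj) <= d:
            # Skip empty/degenerate groups; in CEC such groups end up
            # removed, which is how the number of clusters shrinks.
            continue
        p = len(Xj) / n
        # Maximum-likelihood covariance, ridged for numerical stability.
        cov = np.cov(Xj, rowvar=False, bias=True) + 1e-9 * np.eye(d)
        energy += p * (-np.log(p)
                       + 0.5 * d * np.log(2 * np.pi * np.e)
                       + 0.5 * np.log(np.linalg.det(cov)))
    return energy
```

A Hartigan-style optimization then greedily reassigns individual points whenever doing so lowers this energy, deleting clusters that lose all of their points.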
Extreme Entropy Machines: Robust information theoretic classification
Most existing classification methods aim at minimizing empirical risk
(through some simple point-based error measured with a loss function) with
added regularization. We propose to approach this problem in a more
information-theoretic way by investigating the applicability of entropy
measures as a classification model objective function. We focus on quadratic
Rényi entropy and the connected Cauchy-Schwarz Divergence, which leads to the
construction of Extreme Entropy Machines (EEM).
The main contribution of this paper is a model based on information-theoretic
concepts which, on the one hand, offers a new, entropic perspective on known
linear classifiers and, on the other, leads to the construction of a very
robust method competitive with state-of-the-art non-information-theoretic
ones (including Support Vector Machines and Extreme Learning Machines).
Evaluation on numerous problems, spanning from small, simple ones from the UCI
repository to large (hundreds of thousands of samples), extremely unbalanced
(up to 100:1 class ratios) datasets, shows the wide applicability of the EEM
to real-life problems and that it scales well.
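For background on the objective named above, here is a minimal Parzen-window sketch of quadratic Rényi entropy and the Cauchy-Schwarz Divergence for samples X and Y. The Gaussian kernel, the bandwidth h, and the function names are our assumptions; this is the generic plug-in estimator, not the EEM model itself.

```python
import numpy as np

def kde_cross_term(X, Y, h):
    """Closed form for the integral of the product of two Gaussian KDEs:
    the mean over all pairs of N(x_i - y_j; 0, 2*h^2*I)."""
    d = X.shape[1]
    sq = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    return (4 * np.pi * h**2) ** (-d / 2) * np.mean(np.exp(-sq / (4 * h**2)))

def quadratic_renyi_entropy(X, h):
    """H_2(p) = -log( integral of p(x)^2 ), with p a Gaussian KDE of X."""
    return -np.log(kde_cross_term(X, X, h))

def cauchy_schwarz_divergence(X, Y, h):
    """D_CS(p, q) = -log( int(p*q) / sqrt(int(p^2) * int(q^2)) );
    non-negative, and zero iff p = q almost everywhere."""
    pq = kde_cross_term(X, Y, h)
    pp = kde_cross_term(X, X, h)
    qq = kde_cross_term(Y, Y, h)
    return -np.log(pq / np.sqrt(pp * qq))
```

Maximizing cauchy_schwarz_divergence between class-conditional samples is one natural way such entropic objectives enter a linear classifier.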
Paraconvex, but not strongly, Takagi functions
There is an important open problem in the theory of approximate convexity: whether every paraconvex function on a bounded interval is strongly paraconvex. Our aim is to show that this is not the case. To do this we need the following generalization of the Takagi function. For a sequence $a = (a_i)_{i \in \mathbb{N}} \subset \mathbb{R}_+$ we consider a Takagi-like function of the form $T_{(a)}(x) := \sum_{i=1}^{\infty} a_i \operatorname{dist}\big(x, \tfrac{1}{2^{i-1}}\mathbb{Z}\big)$ for $x \in \mathbb{R}$. We give convenient conditions for verifying whether $T_{(a)}$ is paraconvex or strongly paraconvex. This enables us to construct a class of paraconvex functions which are not strongly paraconvex.
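As a sanity check on the definition (our remark, not part of the abstract): since $\operatorname{dist}(x, \tfrac{1}{2^{i-1}}\mathbb{Z}) = \tfrac{1}{2^{i-1}}\operatorname{dist}(2^{i-1}x, \mathbb{Z})$, the constant sequence $a_i = 1$ recovers the classical Takagi function:

```latex
T_{(1,1,\dots)}(x)
  = \sum_{i=1}^{\infty} \operatorname{dist}\!\Big(x, \tfrac{1}{2^{i-1}}\mathbb{Z}\Big)
  = \sum_{n=0}^{\infty} \frac{1}{2^{n}} \operatorname{dist}\big(2^{n}x, \mathbb{Z}\big).
```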
LOSSGRAD: automatic learning rate in gradient descent
In this paper, we propose a simple, fast and easy-to-implement algorithm,
LOSSGRAD (locally optimal step-size in gradient descent), which automatically
modifies the step-size in gradient descent during neural network training.
Given a function $f$, a point $x$, and the gradient $\nabla f(x)$, we aim
to find the step-size $h$ which is (locally) optimal, i.e. satisfies
$$h = \arg\min_{t \ge 0} f\big(x - t\,\nabla f(x)\big).$$
Making use of a quadratic approximation, we show that the algorithm satisfies
the above condition. We experimentally show that our method is insensitive to
the choice of the initial learning rate while achieving results comparable to
other methods.
Comment: TFML 201
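A minimal sketch of the quadratic-approximation idea, reconstructed under our own assumptions (the names, the probe-based fit, and the fallback rule are ours, not necessarily the authors' exact update): fit a parabola to $\varphi(t) = f(x - t\,\nabla f(x))$ from $\varphi(0)$, $\varphi'(0) = -\|\nabla f(x)\|^2$, and one probe evaluation $\varphi(h)$, then step to the parabola's minimizer.

```python
import numpy as np

def lossgrad_like_step(f, grad_f, x, h):
    """One gradient step with an approximately locally optimal step-size.
    Sketch only; not the paper's exact algorithm."""
    g = grad_f(x)
    phi0 = f(x)
    dphi0 = -np.dot(g, g)      # derivative of phi(t) = f(x - t*g) at t = 0
    phi_h = f(x - h * g)       # single probe at the current step-size
    # Fit q(t) = phi0 + dphi0*t + c*t^2 through the probe point.
    c = (phi_h - phi0 - dphi0 * h) / h**2
    if c > 0:
        h_new = -dphi0 / (2 * c)   # minimizer of the upward-opening parabola
    else:
        h_new = 2 * h              # no usable curvature: grow the step
    return x - h_new * g, h_new

# In a training loop, h_new is carried forward as the next initial step-size.
```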